Collaborative filtering is a major technique for making personalized recommendations about information items (movies, books, web pages, etc.) to individual users. In the literature, a common research objective is to predict unknown ratings of items for a user, on the condition that the user has explicitly rated a certain number of items. Nevertheless, in many practical situations, we may only have implicit...
In this paper, we propose an efficient text classification method using term projection. First, we use a modified χ² statistic to project terms into predefined categories, which is more efficient than other clustering methods. Afterwards, we use the generated clusters as features to represent the documents. The classification is then performed in a rule-based manner or via...
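As a rough illustration of projecting terms into categories with a χ² score, the sketch below uses the standard χ² statistic over a 2×2 term/category contingency table. The abstract does not specify the authors' modification, so this is not their method: the restriction to positively associated categories, and the names `chi_square` and `project_terms`, are assumptions made for this example.

```python
from collections import defaultdict


def chi_square(a, b, c, d):
    """Standard chi-square for a 2x2 term/category table.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category without the term
    d: docs outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def project_terms(docs, labels):
    """Map each term to the category it is most positively associated with.

    Assumption (not from the paper): negative associations score 0, so a term
    is never projected into a category it tends to avoid.
    """
    categories = sorted(set(labels))
    n_docs = len(docs)
    cat_size = {cat: labels.count(cat) for cat in categories}

    # term -> category -> number of documents containing the term
    doc_freq = defaultdict(lambda: defaultdict(int))
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term][label] += 1

    projection = {}
    for term, per_cat in doc_freq.items():
        total = sum(per_cat.values())
        best_cat, best_score = None, 0.0
        for cat in categories:
            a = per_cat.get(cat, 0)
            b = total - a
            c = cat_size[cat] - a
            d = n_docs - a - b - c
            score = chi_square(a, b, c, d) if a * d > c * b else 0.0
            if score > best_score:
                best_cat, best_score = cat, score
        if best_cat is not None:
            projection[term] = best_cat
    return projection


docs = [["goal", "match"], ["goal", "team"], ["cpu", "chip"], ["cpu", "code"]]
labels = ["sports", "sports", "tech", "tech"]
projection = project_terms(docs, labels)
```

The resulting term-to-category map could then serve as the cluster features mentioned in the abstract, for example by counting how many of a document's terms fall into each category.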
The structure of linked documents is dynamic and keeps changing. Although different methods have been proposed to exploit the link structure in identifying hubs and authorities in a set of linked documents, no existing approach can effectively deal with these changes. This paper explores changes in linked documents and proposes an incremental link probabilistic framework, which we call...
Because users rarely have the patience to provide enough labeled data, a personalized filter is expected to converge much faster. Topic-model-based dimension reduction can minimize the structural risk with limited training data. In this paper, we propose a novel supervised dual-PLSA which estimates topics with many kinds of observable data, i.e. labeled and unlabeled documents, supervised information about...
This paper presents an interactive content-based image retrieval framework, uInteract, for delivering a novel four-factor user interaction model visually. The four-factor user interaction model is an interactive relevance feedback mechanism that we proposed, aiming to improve the interaction between users and the CBIR system and, in turn, users' overall search experience. In this paper, we present how...
When ranking texts retrieved for a query, the semantics of each term t in the texts is a fundamental basis. The semantics often depends on the locality context (neighboring terms) of t in the texts. In this paper, we present a technique, CTFA4TR, that improves text rankers by encoding the term locality contexts into the assessment of the term frequency (TF) of each term in the texts. Results of the TF assessment...
Bootstrapping is a weakly supervised algorithm that has been the focus of attention in many Information Extraction (IE) and Natural Language Processing (NLP) fields, especially in learning semantic lexicons. In this paper, we propose a new bootstrapping algorithm called Mutual Screening Graph Algorithm (MSGA) to learn semantic lexicons. The approach uses only an unannotated corpus and a few seed words...
In this paper, we propose a method for image retrieval on the web. In this task, we focus on abstract words that do not directly link to the images that we want. For example, a user might use the query “summer” to retrieve images of “fireworks” or “a white sand beach with the sea”. In this case, retrieval systems need to infer direct words for the images from the abstract query of the user. In our method,...
Nowadays, we are faced with finding “trustworthy” answers, not only “relevant” ones. This paper proposes a QA model based on answer trustworthiness. In contrast to past research, which focused on simple trust factors of a document, we identified three different answer trustworthiness factors: 1) incorporating document quality at the document layer; 2) representing the authority and reputation of...
Opinion retrieval is a novel information retrieval task and has attracted a great deal of attention with the rapid increase of online opinionated information. Most previous work adopts the classical two-stage framework, i.e., first retrieving topic-relevant documents and then re-ranking them according to opinion relevance. However, none has considered the problem of domain coherence between queries...
Learning to rank has become a hot topic in the information retrieval community. It combines relevance judgment information with approaches from both information retrieval and machine learning, so as to learn a more accurate ranking function for retrieval. Most previous approaches rely only on the labeled relevance information provided, thus suffering from the limited training data size...
Opinion mining systems suffer a great loss when unknown opinion targets constantly appear in newly composed reviews. Previous opinion target extraction methods typically consider human-compiled opinion targets as seeds and adopt syntactic/statistical patterns to extract opinion targets. Three problems are worth noting. First, the manually defined opinion targets are too large to be good seeds. Second,...
Relevance evaluation is an important topic in Web search engine research. Traditional evaluation methods require a huge amount of human effort, which leads to an extremely time-consuming process in practice. By analyzing large-scale user query logs and click-through data, we propose a performance evaluation method that fully automatically generates large-scale Web search topics and answer sets...
Traditional clustering algorithms often suffer from the model misfit problem when the distribution of real data does not fit the model assumptions. To address this problem, we propose a novel clustering framework based on adaptive space mapping and rescaling, referred to as the M-R framework. The basic idea of our approach is to adjust the data representation to make the data distribution fit the model assumptions...
Cross-Language Information Retrieval (CLIR) combines traditional Information Retrieval and Machine Translation techniques. Many aspects of CLIR are related to the problem of polysemy, which makes them good entry points for applying WSD in CLIR. Therefore, this paper attempts to apply WSD in English-Chinese Bi-Directional CLIR. The query expansion and the proposed Lesk-C WSD...
The availability of an enormous amount of digital music makes it challenging to organize and retrieve it effectively. We explore the retrieval of polyphonic Indonesian folksongs based on pattern matching, such as n-gram matching, when searching the songs. We compare the pattern matching results with those of a regular text-based information retrieval system. The folksongs are either fully or partially indexed. The results of the experiments...
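To make the idea of n-gram pattern matching over melodies concrete, here is a minimal sketch. It is not the authors' system: it assumes each song has already been reduced to a single sequence of MIDI pitch numbers (the abstract's polyphonic case would need an extra reduction step), it matches on pitch intervals so that transposed queries still match, and the names `intervals`, `ngrams`, `ngram_score` and `rank_songs` are hypothetical.

```python
from collections import Counter


def intervals(pitches):
    """Convert a pitch sequence into successive pitch differences."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def ngrams(seq, n):
    """Return all contiguous n-grams of seq as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_score(query, song, n=3):
    """Fraction of the query's interval n-grams that also occur in the song."""
    q = Counter(ngrams(intervals(query), n))
    s = Counter(ngrams(intervals(song), n))
    if not q:
        return 0.0
    shared = sum((q & s).values())  # multiset intersection
    return shared / sum(q.values())


def rank_songs(query, songs, n=3):
    """Order song names by descending n-gram score against the query."""
    return sorted(songs, key=lambda name: ngram_score(query, songs[name], n),
                  reverse=True)


# Toy collection (made-up pitch sequences, not real folksong data).
songs = {
    "song_a": [60, 62, 64, 65, 67, 65, 64, 62],
    "song_b": [60, 60, 67, 67, 69, 69, 67],
}
query = [62, 64, 66, 67, 69]  # opening of song_a, transposed up two semitones
ranking = rank_songs(query, songs)
```

Indexing the n-grams of only part of each song, rather than the whole sequence, would correspond to the partially indexed setting the abstract mentions.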
User behavior analysis has been shown to be important for the optimization and evaluation of Web search and has become one of the major areas in both information retrieval and knowledge management research. This paper focuses on studying the reliability of users’ search behavior based on large-scale query and click-through logs collected from commercial search engines. The concept of reliability is defined...
Because blogs are easy to use, this new form of web content has become a popular online medium. Detecting the popularity of blogs in the massive blogosphere is a critical issue. General search engines that ignore the social interconnection between bloggers discriminate poorly between blogs. This study extracts real-world blog data and analyzes the interconnection in these blog communities for blog...